{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "## 第8章-提升方法-前向分步算法\n",
    "\n",
    "&emsp;&emsp;本节介绍当使用前向分步算法，求解一个以指数函数为损失函数的加法模型时，得到的结果与AdaBoost算法之间的关系。书中在8.3节已经给出了结论，这两个算法是等价的，现求证该结论。  \n",
    "**证明：**  \n",
    "### $G^*$求解\n",
    "前向分步算法学习的是加法模型$\\displaystyle f(x)=\\sum_{m=1}^M \\alpha_m G_m(x)$，其中$G_m(x)$是基本分类器，取值为$\\{+1, -1\\}$，损失函数为$L(y, f(x)) = \\exp (-y f(x)) $，$y \\in \\{+1, -1\\}$，在该推导过程中，使用的是数学归纳法，已经得到了第$m-1$步的分类器$\\displaystyle f_{m-1}(x) = \\sum_{j=1}^{m-1} \\alpha_j G_j(x)$，求解$f_m(x) = f_{m-1}(x) + \\alpha_m G_m(x)$  \n",
    "$\\displaystyle \\alpha_m, G_m = \\mathop{\\arg \\min} \\limits_{\\alpha, G} \\sum_{i=1}^N \\exp \\Big\\{ -y_i \\big[ f_{m-1}(x_i) + \\alpha G(x_i) \\big] \\Big\\}$  \n",
    "令$\\bar{w}_{m,i} = \\exp\\big(-y_i f_{m-1}(x_i)\\big)$  \n",
    "$\\displaystyle \\therefore \\mathop{\\arg \\min} \\limits_{\\alpha, G} \\sum_{i=1}^N \\exp \\Big\\{ -y_i \\big[ f_{m-1}(x_i) + \\alpha G(x_i) \\big] \\Big\\} = \\mathop{\\arg \\min} \\limits_{\\alpha, G} \\sum_{i=1}^N \\bar{w}_{m,i} \\exp(- y_i \\alpha G(x_i))$，其中$\\bar{w}_{m,i}$与$\\alpha$和$G$无关，但$\\bar{w}_{m,i}$依赖$f_{m-1}(x)$，随着每一轮迭代而发生变化。  \n",
    "因为$y_i G(x_i)$取值为$\\{+1,-1\\}$，将$\\displaystyle \\sum_{i=1}^N \\bar{w}_{m,i} \\exp(- y_i \\alpha G(x_i))$拆开，可得：  \n",
    "$$\\begin{array}{ll} {} & \\displaystyle \\sum_{i=1}^N \\bar{w}_{m,i} \\exp(- y_i \\alpha G(x_i)) \\\\\n",
    "=& \\displaystyle \\sum_{i \\in M_1} \\bar{w}_{m,i} \\exp(-\\alpha) + \\sum_{i \\in M_2} \\bar{w}_{m,i} \\exp(\\alpha) \\\\  \n",
    "=& \\displaystyle \\sum_{i \\in M_1} \\bar{w}_{m,i} \\exp(-\\alpha) + \\sum_{i \\in M_2} \\bar{w}_{m,i} \\exp(-\\alpha) + \\sum_{i \\in M_2} \\bar{w}_{m,i} \\big[\\exp(\\alpha) - \\exp(-\\alpha) \\big] \\\\\n",
    "=& \\displaystyle \\exp (-\\alpha) \\sum_i \\bar{w}_{m,i} + \\big[\\exp(\\alpha) - \\exp(-\\alpha) \\big] \\sum_{i \\in M_2} \\bar{w}_{m,i} \\\\\n",
    "=& \\displaystyle \\exp (-\\alpha) \\sum_i \\bar{w}_{m,i} + \\big[\\exp(\\alpha) - \\exp(-\\alpha) \\big] \\sum \\bar{w}_{m,i} I(y_i \\neq G(x_i))\n",
    "\\end{array}$$，其中$M_1$为正类集合，$M_2$为负类集合。  \n",
    "$\\therefore G_m^* = \\mathop{\\arg \\min} \\limits_{G} \\sum \\bar{w}_{m,i} I(y_i \\neq G(x_i))$  \n",
    "\n",
    "### $\\alpha^*$求解\n",
    "$\\displaystyle \\alpha_m = \\mathop{\\arg \\min} \\limits_{\\alpha} \\sum \\bar{w}_{m,i} \\exp(-\\alpha y_i G_m^*(x_i))$  \n",
    "$\\displaystyle \\begin{array}{ll} \n",
    "\\because & \\sum \\bar{w}_{m,i} \\exp(-\\alpha y_i G_m^*(x_i)) \\\\\n",
    "=& \\displaystyle \\sum_{i \\in M_1} \\bar{w}_{m,i} \\exp(-\\alpha) + \\sum_{i \\in M_2} \\bar{w}_{m,i} \\exp(\\alpha) \\\\\n",
    "=& \\displaystyle (e^{\\alpha} - e^{-\\alpha}) \\sum_{i=1}^N \\bar{w}_{m,i} I(y_i \\neq G(x_i)) + e^{-\\alpha} \\sum_{i=1}^N \\bar{w}_{m,i}\n",
    "\\end{array} $  \n",
    "求解上式最小的$\\alpha$，则令上式求导等于0：  \n",
    "$\\displaystyle \\therefore (e^{\\alpha} + e^{-\\alpha}) \\sum_{i=1}^N \\bar{w}_{m,i} I(y_i \\neq G(x_i)) - e^{-\\alpha} \\sum_{i=1}^N \\bar{w}_{m,i} = 0$  \n",
    "$\\displaystyle \\therefore (e^{2 \\alpha} + 1) \\sum_{i=1}^N \\bar{w}_{m,i} I(y_i \\neq G(x_i)) = \\sum_{i=1}^N \\bar{w}_{m,i}$  \n",
    "$\\displaystyle \\therefore \\alpha_m^* = \\frac{1}{2} \\ln \\frac{\\sum \\bar{w}_{m,i} - \\sum \\bar{w}_{m,i} I(y_i \\neq G(x_i)) }{\\sum \\bar{w}_{m,i} I(y_i \\neq G(x_i)) } = \\frac{1}{2} \\ln \\frac{\\displaystyle 1- \\frac{\\sum \\bar{w}_{m,i} I(y_i \\neq G(x_i) }{\\sum \\bar{w}_{m,i}}}{\\displaystyle  \\frac{\\sum \\bar{w}_{m,i} I(y_i \\neq G(x_i) }{\\sum \\bar{w}_{m,i}}}$  \n",
    "令$\\displaystyle e_m = \\frac{\\sum \\bar{w}_{m,i} I(y_i \\neq G(x_i) }{\\sum \\bar{w}_{m,i}}$  \n",
    "则$\\displaystyle \\alpha_m^* = \\frac{1}{2} \\ln \\frac{1-e_m}{e_m}$  \n",
    "\n",
    "### 权重$\\bar{w}_{m,i}$的更新\n",
    "&emsp;&emsp;目前$G$和$\\alpha$都和AdaBoost算法是一致的，现在只差一个权重的更新。  \n",
    "之前令$\\bar{w}_{m,i} = \\exp\\big(-y_i f_{m-1}(x_i)\\big)$  \n",
    "$\\begin{aligned} \\bar{w}_{m,i} \n",
    "&= \\exp\\big(-y_i f_{m-1}(x_i)\\big) \\\\\n",
    "&= \\exp \\big( -y_i \\sum_{j=1}^{m-1} \\alpha_j G_j(x_i) \\big) \\\\\n",
    "&= \\prod_j \\exp \\big(-y_i \\alpha_j G_j(x_i) \\big)\n",
    "\\end{aligned}$  \n",
    "&emsp;&emsp;上述的公式与权值更新只相差一个规范化因子，因为每一轮所乘的规范化因子对每一个样本都是一样的，所以不影响求解结果，该值与AdaBoost算法也是一致的。  \n",
    "&emsp;&emsp;现在还差最后一个问题，因为之前讲的是通过数学归纳法推导的，只通过$m-1$步推出了$m$步，那么第1步是如何推导的呢？第1步的推导过程也可以根据书中的推导出来，唯一的区别就是在最小化损失函数$\\alpha_m,G_m = \\arg \\min \\sum \\exp(-y_i f_{m-1}(x_i) + \\alpha G(x_i))$，在第1步中$f_{m-1}(x_i)$是不存在的，可得$\\alpha_m,G_m = \\arg \\min \\sum \\exp(\\alpha G(x_i)) $，剩下的推导过程是一致的，得证."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.2"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
